Reusable transformations of Data Cube Vocabulary datasets from the fiscal domain
Abstract
Shared data models provide leverage for reusable data transformations. Common modelling patterns and data structures can make data transformations applicable to diverse datasets. Like data models, reusable data transformations promote separation of concerns, prevent duplication of effort, and reduce the time spent processing data. However, unlike data models, which can be shared as RDF vocabularies or ontologies, data transformations have no well-established way of being shared. We propose a way to share data transformations as ‘pipeline fragments’ for LinkedPipes ETL (LP-ETL), which is an RDF-based data processing tool focused on RDF data. We describe the features of LP-ETL that enable development of reusable transformations as pipeline fragments. Pipeline fragments are represented in RDF as JSON-LD files that can be shared directly or via dereferenceable IRIs. We demonstrate the use of pipeline fragments on data transformations for fiscal data described by the Data Cube Vocabulary (DCV). We cover both generic transformations for any DCV-compliant data, such as DCV validation or DCV to CSV conversion, and transformations specific to the fiscal data used in the OpenBudgets.eu (OBEU) project, including conversion of Fiscal Data Package to RDF or normalization of monetary values. The applicability of these transformations is demonstrated on concrete use cases serving the goals of the OBEU project.
Similar resources
Modeling fiscal data with the Data Cube Vocabulary
We present a fiscal data model based on the Data Cube Vocabulary, which we developed for the OpenBudgets.eu project. The model defines component properties out of which data structure definitions for concrete datasets can be composed. Based on initial usage experiments, simple validation constraints have been formulated.
Querying the Global Cube: Integration of Multidimensional Datasets from the Web
National statistical indicators such as the Gross Domestic Product per Capita are published on the Web by various organisations such as Eurostat, the World Bank and the International Monetary Fund. Uniform access to such statistics will allow for elaborate analysis and visualisations. Though many datasets are also available as Linked Data, heterogeneities remain since publishers use several ide...
Towards Budget Comparative Analysis: the need for Fiscal Codelists as Linked Data
Code lists are a key part of budget datasets as they serve for the coding of fiscal concepts within them. However, the great diversity of classifications across countries and concepts does not allow one to presume upon their actual value as dimension properties. In this paper we discuss the need for creating code lists as Linked Data for the classifications used in fiscal datasets, in three basic ste...
Detecting and Reporting Extensional Concept Drift in Statistical Linked Data
The RDF Data Cube vocabulary is a catalyst for the availability of statistical Linked Data: raw statistical Linked Data are easy to model in, publish to, and retrieve from the Linked Data cloud. In statistical datasets, concepts are central entities represented by variables and their values. The meaning of these concepts is often assumed to be stable, but in fact it can change over time: we cal...
Visualizing RDF Data Cubes Using the Linked Data Visualization Model
Data Cube represents one of the basic means for storing, processing and analyzing statistical data. Recently, the RDF Data Cube Vocabulary became a W3C recommendation and at the same time interesting datasets using it started to appear. Along with them appeared the need for compatible visualization tools. The Linked Data Visualisation Model is a formalism focused on this area and is implemented...
Publication date: 2016